ABSTRACT

In 2007, the Comparative Fungal Genomics Platform
(CFGP; http://cfgp.snu.ac.kr/) was publicly released
with 65 genomes corresponding to 58 fungal and
Oomycete species. CFGP provided six bioinformatics
tools, including a novel tool entitled BLAST Matrix
that enables simultaneous searches for genes
homologous to queries in multiple species.
CFGP also introduced Favorite, a personalized
virtual space for data storage and analysis with
these six tools. Since 2007, CFGP has grown to
archive 283 genomes corresponding to 152 fungal
and Oomycete species as well as 201 genomes
that correspond to seven bacteria, 39 plants and
105 animals. In addition, the number of tools in
Favorite has increased to 27. The Taxonomy Browser
of CFGP 2.0 allows users to interactively navigate
through a large number of genomes according to
their taxonomic positions. The user interface of
BLAST Matrix was also improved to facilitate
subsequent analyses of retrieved data. A newly developed
genome browser, Seoul National University Genome
Browser (SNUGB), was integrated into CFGP 2.0 to
support graphical presentation of diverse genomic
contexts. Based on the standardized genome
warehouse of CFGP 2.0, several systematic platforms
designed to support studies on selected gene
families have been developed. Most of them are
connected through Favorite to allow data sharing
across the platforms.



INTRODUCTION 

Fungal genome sequencing has rapidly increased since the
release of the genome sequence of Saccharomyces
cerevisiae in 1996 (1). With the current and anticipated
advances in sequencing technology (2,3), the rate of
fungal genome sequencing will continue to accelerate.
Currently, more than 300 fully sequenced fungal genomes
exist in the public domain (4,5), with many additional
species and several isolates of previously sequenced
species being sequenced (6,7). In addition, the 1000
Fungal Genome project (F1000; http://1000.fungalgenomes.org/)
will greatly help us to uncover the genomic
underpinnings of fungal evolution and lifestyles via
large-scale comparative genomics studies. In combination
with genomes from plants and animals as well as fungi,
in-depth comparative genomics across multiple eukaryotic
kingdoms will be facilitated (8-10). To efficiently support
such large-scale, genome-based inquiries, it is critical to
archive the available genome sequences and annotation
information in a standardized format so that they can be
easily and efficiently retrieved and analyzed.

To address this need, the first version of the
Comparative Fungal Genomics Platform (CFGP) was
released in 2007 with 65 fungal and Oomycete genomes (11).
CFGP was founded on a new user interface (UI)
called the Data-driven User Interface (DUI), which made
the use of its bioinformatics tools and the management of
task histories easy and efficient. Since then, the numbers
of archived genomes and bioinformatics tools have grown
substantially (Supplementary Table S1 and Table 1).
Furthermore, the standardized genome data warehouse
of CFGP has supported the development of multiple
platforms that are specialized for supporting the archiving and
analysis of specific gene families and functional groups
(Table 2). Some of these platforms share the Favorite of
CFGP to provide an efficient mechanism for sharing data
with CFGP and to enable the use of its bioinformatics
tools for a variety of analyses. In this article, we outline
the improvements made in CFGP 2.0 and how its
standardized genome warehouse has been exploited in
the development of other comparative genomics platforms.

Table 1. List of bioinformatics tools available in CFGP 2.0

Category                               Tool           Input      Reference
ID of functional                       InterPro Scan
Phylogenetic                                          Alignment  (15)
ID of secreted proteins                SignalP 3.0    Sequences
                                                      Sequences  (20)
                                       SecretomeP     Sequences
Subcellular localization               predictNLS     Sequences
                                                      Sequences  (24)
                                       TargetP        Sequences  (25)
Prediction of trans-membrane helices                  Sequences  (26)
Prediction of RNA secondary structure  tRNAScan-SE    Sequences  (27)
                                       mFold 3.2      Sequences  (28)
Post-translational modification        NetCGlyc       Sequences  (29)
                                       NetNGlyc       Sequences
                                       NetOGlyc       Sequences  (30)
                                       NetPhos        Sequences  (31)
Conserved motif search                 MEME           Sequences  (32)

A total of 27 tools in nine categories are available via the Favorite
browser of CFGP 2.0.

METHODS 

System design 

The structure and core databases of CFGP 2.0 are basically
identical to those used for the first version. The system
consists of databases, including a standardized genome
warehouse; wrapper programs written in Perl and C;
and the DUI. To balance the server load and
ensure more efficient operation of the system, its core
databases were distributed across multiple servers, and more
web servers were added. The MySQL relational database
management system was used to manage and curate the
data. The web interfaces were written in PHP with
JavaScript, and analysis functions in the Favorite Browser
were relayed by Perl scripts and automatically coordinated
by the system monitoring servers.

Table 2. List of online platforms and tools supporting studies on
specific gene families or functional groups^a

Name                                        URL                               Reference
Cyber-infrastructure for Fusarium           http://www.fusariumdb.org/        (33)
Fungal Transcription Factor Database        http://ftfd.snu.ac.kr/            (34)
Fungal Cytochrome P450 Database             http://p450.riceblast.snu.ac.kr/  (35)
Seoul National University Genome Browser    http://genomebrowser.snu.ac.kr/   (36)
The Systematic Platform for Identifying
  Mutated Proteins                          http://pimp.starflr.info/         (37)
Insect Mitochondrial Genome Database        http://www.imgd.org/              (38)
Fungal Secretome Database                   http://fsd.snu.ac.kr/             (39)
Eukaryotic DNAJ/K Database                  http://edd.snu.ac.kr/
Cell Wall-degrading Enzyme Database         http://www.cwde.org/

^aThese platforms were developed using the standardized genome
warehouse and the web template engine of CFGP 2.0.

Mining orthologs 

The source code of InParanoid 4.1 was used for the
identification of orthologs in the archived proteomes. First,
the genomes of 35 species that are frequently utilized
were subjected to ortholog identification (Supplementary
Table S1). All pairwise comparisons of data from these
35 species were carried out. The latest version,
InParanoid 7, provides orthologs from 100 eukaryotic
genomes. However, some genomes that we have used
were not included in the latest version; data for those
species were downloaded from CFGP 2.0 and subsequently
subjected to ortholog identification.
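
As a rough illustration of the scale of this step, the sketch below
enumerates the all-against-all species pairs and shows the two-way
best-hit test that seeds ortholog pairs in InParanoid-style methods.
It is a minimal sketch, not the InParanoid source code: the species
labels, the function names and the best-hit dictionaries are
hypothetical, and the in-paralog clustering that InParanoid performs
after seeding is omitted.

```python
from itertools import combinations


def pairwise_jobs(species):
    """Return every unordered pair of distinct species (one comparison each)."""
    return list(combinations(sorted(species), 2))


def reciprocal_best_hits(best_ab, best_ba):
    """Keep (a, b) only when a's best hit is b AND b's best hit is a.

    best_ab maps each protein of proteome A to its best-scoring hit in B;
    best_ba is the same mapping in the opposite direction.
    """
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical placeholder labels; the real list is in Supplementary Table S1.
species = [f"species_{i:02d}" for i in range(1, 36)]
jobs = pairwise_jobs(species)
print(len(jobs))  # 35 * 34 / 2 = 595 pairwise comparisons

# Toy best-hit tables (assumed input, e.g. parsed from BLAST output).
best_ab = {"a1": "b1", "a2": "b1"}
best_ba = {"b1": "a1", "b2": "a2"}
print(reciprocal_best_hits(best_ab, best_ba))  # [('a1', 'b1')]
```

With 35 species, the number of pairwise runs grows quadratically,
which is one reason pre-computing the results is attractive.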

EXPANDED GENOME DATA WAREHOUSE 

The standardized genome warehouse of CFGP has been 
substantially expanded in both the number of species and 
taxonomic coverage. In addition to 283 genomes corres- 
ponding to 152 fungal and Oomycete species, 39 plant and 
105 animal genomes have also been archived (Figure 1 and 
Supplementary Table S2). The animal and plant genomes 
were incorporated to enable comparative evolutionary 
genomics studies across multiple eukaryotic kingdoms. 

ENHANCED UTILITY AND NEW FEATURES 

Improved UI 

The UI of CFGP 2.0 was greatly improved to provide a
better user experience. All modifications followed the
HTML5 and CSS3 standards, which most widely used
web browsers support. Three main application/utility
frames were rearranged to make switching from one frame
to another more intuitive. The Favorite and presentation
frames can be toggled for more flexible browsing (Figure 2).
All web pages have been thoroughly tested with multiple
web browsers, including Chrome, Firefox, Internet
Explorer and Safari.

D716 Nucleic Acids Research, 2013, Vol. 41, Database issue

Figure 1. A diagram illustrating the system architecture and the content of the genomes archived in CFGP 2.0. Key features of CFGP 2.0 are
depicted on the left. The web-based platforms that have been developed based on the standardized genome warehouse of CFGP 2.0 are listed on the
right. Bidirectional arrows indicate that they support the Favorite Browser, which synchronizes with CFGP 2.0. Dashed arrows denote that SNUGB
was integrated in two platforms, FSD and EDD. In the pie graph, the inner and outer circles represent the number of genomes and species for each
taxon, respectively.
[Figure 1 labels: CFGP 2.0 (Data-driven User Interface, Favorite Browser, 27 bioinformatics tools, SNU Genome Browser, database of 484 genomes);
gene family databases (FTFD, transcription factors; P450, cytochrome P450 genes; FSD, secreted proteins; EDD, DNAJ/K genes; CWDE, cell wall-degrading
enzymes); web-based systems (IMGD, mitochondrial genome DB; SysPIMP, mutated human proteins; CiF, Fusarium database).]

Figure 2. Outline of the improved DUI of CFGP 2.0. The frames outlined in blue, orange and purple correspond to the
Favorite, presentation and application frames, respectively. The buttons boxed in red hide or show the Favorite and presentation frames,
respectively.

Taxonomy browser 

As the number of species representing diverse taxa
continues to increase, a browsing tool based on simple text
search (e.g. species name) is no longer sufficient to help users
efficiently explore the available genomes. The Taxonomy
Browser implemented in CFGP 2.0 provides a predictive
text feature as well as a hierarchical, tree-based taxon
structure that shows the taxonomic position of a chosen
species. Once a specific species is selected, the number of
genomes and the available sequence types are listed with
direct links to the corresponding sequences.
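
The two features described above, predictive text and a tree-based
view of taxonomic position, can be sketched as follows. This is an
illustrative sketch only and does not reflect the actual CFGP 2.0
implementation (which is written in PHP and JavaScript): the PARENT
table is a hypothetical, heavily abbreviated taxonomy, and suggest()
and lineage() are names chosen here for illustration.

```python
from bisect import bisect_left

# Hypothetical miniature taxonomy (child -> parent); intermediate ranks omitted.
PARENT = {
    "Magnaporthe oryzae": "Magnaporthe",
    "Magnaporthe": "Sordariomycetes",
    "Sordariomycetes": "Ascomycota",
    "Saccharomyces cerevisiae": "Saccharomyces",
    "Saccharomyces": "Saccharomycetes",
    "Saccharomycetes": "Ascomycota",
    "Ascomycota": "Fungi",
}


def suggest(prefix, names, limit=5):
    """Case-insensitive prefix completion using binary search on sorted names."""
    prefix = prefix.lower()
    keyed = sorted((name.lower(), name) for name in names)
    keys = [key for key, _ in keyed]
    start = bisect_left(keys, prefix)  # first entry that is >= the prefix
    matches = []
    for key, name in keyed[start:]:
        if not key.startswith(prefix) or len(matches) == limit:
            break
        matches.append(name)
    return matches


def lineage(taxon):
    """Walk child -> parent links and return the path from the root to the taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


print(suggest("sacch", PARENT))
# ['Saccharomyces', 'Saccharomyces cerevisiae', 'Saccharomycetes']
print(" > ".join(lineage("Magnaporthe oryzae")))
# Fungi > Ascomycota > Sordariomycetes > Magnaporthe > Magnaporthe oryzae
```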

Seoul National University Genome Browser 

The first version of CFGP did not offer a graphical
interface to present the genomic context and notable features
such as GC content, functional domains and signal
peptides. The new genome browser implemented in
CFGP 2.0 enables users to view such information in a
chosen region. The target region can be selected
by marking the start and end positions with a mouse or
by typing in its genome coordinates (Figure 3).
Figure 3. A screenshot of SNUGB implemented in CFGP 2.0. SNUGB allows users to view 13 biological features and the gene structures in the selected
genome region. Those features include GC content, functional domains, nuclear localization signals, signal peptides and trans-membrane helices.

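
To make one of these features concrete, the sketch below computes a
GC-content track for a region chosen by start and end coordinates. It
is a hedged example rather than the SNUGB code: the function names,
the 1-based inclusive coordinate convention and the window size are
all assumptions made here for illustration.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T); 0.0 if none."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def gc_track(sequence, start, end, window=10):
    """GC fraction for consecutive windows over the region [start, end].

    Coordinates are assumed to be 1-based and inclusive. Each item is
    (window start coordinate, GC fraction); the last window may be shorter.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("region must lie within the sequence")
    region = sequence[start - 1:end]
    return [
        (start + offset, gc_content(region[offset:offset + window]))
        for offset in range(0, len(region), window)
    ]


toy = "AAAAGGGGCCCCTTTT"
print(gc_track(toy, 5, 12, window=4))  # [(5, 1.0), (9, 1.0)]
print(gc_track(toy, 1, 8, window=4))   # [(1, 0.0), (5, 1.0)]
```
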
New bioinformatics tools added to the Favorite Browser

Compared with the first version that only provided six 
tools, CFGP 2.0 is equipped with 27 tools covering nine 
categories of data analysis or viewing (Table 1). This 
addition enables users to perform more analyses without 
leaving CFGP. 

Ortholog browsing function 

Finding orthologs for a specific gene in multiple species
often requires numerous BLAST searches and validation
processes. In the first version of CFGP, we tried to
eliminate copy-and-paste of sequences by incorporating the DUI
(11). In CFGP 2.0, we simplified the identification and
collection of orthologs by offering pre-computed data. There
are several ortholog identification programs such as
InParanoid (40), Ortholog, MSOAR (41) and THOR
(42). We adopted InParanoid to identify orthologs via
pairwise comparisons among 35 frequently accessed
genomes. For every protein sequence encoded by each of
these 35 genomes, orthologous genes in the other
34 genomes are provided to allow a quick overview of
its distribution among these species and also to support
further analyses by saving them into a Favorite on
the fly.
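
The lookup described above can be pictured as an index built once from
the pairwise results and then queried per protein. The following is a
minimal sketch under stated assumptions: proteins are identified by
hypothetical (species, protein ID) tuples, the ortholog pairs are toy
data, and the actual CFGP 2.0 tables (stored in MySQL) are not
described in enough detail here to be reproduced.

```python
from collections import defaultdict


def build_ortholog_index(pairs):
    """Index pre-computed ortholog pairs in both directions.

    pairs: iterable of ((species_a, protein_a), (species_b, protein_b)).
    Returns {protein: {other_species: set(ortholog IDs)}}.
    """
    index = defaultdict(lambda: defaultdict(set))
    for (sp_a, prot_a), (sp_b, prot_b) in pairs:
        index[(sp_a, prot_a)][sp_b].add(prot_b)
        index[(sp_b, prot_b)][sp_a].add(prot_a)
    return index


def ortholog_profile(index, query, all_species):
    """Return {species: sorted ortholog IDs} for every species except the query's."""
    query_species = query[0]
    hits = index.get(query, {})
    return {
        sp: sorted(hits.get(sp, ()))
        for sp in all_species
        if sp != query_species
    }


# Toy data with four species instead of 35.
pairs = [
    (("sp1", "p1"), ("sp2", "q1")),
    (("sp1", "p1"), ("sp3", "r1")),
    (("sp1", "p1"), ("sp3", "r2")),
]
index = build_ortholog_index(pairs)
profile = ortholog_profile(index, ("sp1", "p1"), ["sp1", "sp2", "sp3", "sp4"])
print(profile)  # {'sp2': ['q1'], 'sp3': ['r1', 'r2'], 'sp4': []}
```

An empty list marks a species with no detected ortholog, which gives
the quick overview of distribution mentioned above.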

FUNCTIONAL/EVOLUTIONARY GENOMICS 
PLATFORMS DEVELOPED BASED ON 
THE STANDARDIZED GENOME 
WAREHOUSE OF CFGP 2.0 

Using the standardized genome warehouse of
CFGP 2.0, a number of platforms that aim to support
comparative analyses of specific gene families and/or
functional groups have been developed: (i) Cyber-infrastructure
for Fusarium (CiF; http://www.fusariumdb.org/) (33),
(ii) Fungal Transcription Factor Database
(FTFD; http://ftfd.snu.ac.kr/) (34), (iii) Fungal
Cytochrome P450 Database (FCPD; http://p450.riceblast.snu.ac.kr/)
(35), (iv) Fungal Secretome Database (FSD;
http://fsd.snu.ac.kr/) (39), (v) Eukaryotic DNAJ and
DNAK Database (EDD; http://edd.snu.ac.kr/) (Cheong
et al., manuscript in preparation) and (vi) Cell
Wall-degrading Enzymes Database (CWDE; http://www.cwde.org/)
(Choi et al., manuscript in preparation). The
Seoul National University Genome Browser (SNUGB)
(http://genomebrowser.snu.ac.kr/) (36) was also
implemented in FSD and EDD. The Insect Mitochondrial
Genome Database (IMGD; http://www.imgd.org/) (38)
employs the Species-driven UI, which enables intuitive
and fast taxonomical browsing with multiple add-on
analysis functions. Finally, the Systematic Platform for
Identifying Mutated Proteins (SysPIMP; http://pimp.starflr.info/)
(37) was developed to support the identification
of mutations related to human diseases. The Favorite
Browser of CFGP 2.0 is connected with many of these
platforms to efficiently support data exchange and
sharing across multiple platforms. All the data saved in
the Favorite Browser are synchronized in real time so that
users can fully exploit the data and functions provided by
these platforms.



FUTURE DIRECTIONS 

To keep up with the rapid release and updating of
eukaryotic genomes, CFGP 2.0 will be updated on a regular
basis. We will integrate more useful modules, software
and interface schemes to continuously improve the
environment for users conducting comparative and evolutionary
genomics studies. To support efforts to uncover the
possible functions of many hypothetical genes, the
ortholog information database will be expanded by adding the
corresponding information from more species.

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Tables 1 and 2. 